AITopics

2605.06826

Country: Asia > Middle East > UAE (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningApr-28-2026

Nearly Optimal Subdata Selection

Yang, Min, Zheng, Wei, Stufken, John, Chang, Ming-Chung, Tian, Ting, Wang, Xueqin

When, in terms of the number of data points, the size of a dataset exceeds available computing resources, or when labeling is expensive, an attractive solution consists of selecting only some of the data points (subdata) for further consideration. A central question for selecting subdata of size $n$ from $N$ available data points is which $n$ points to select. While an answer to this question depends on the objective, one approach for a parametric model and a focus on parameter estimation is to select subdata that retains maximal information. Identifying such subdata is a classical NP-hard problem due to its inherent discreteness. Based on optimal approximate design theory, we develop a new methodology for information-based subdata selection, resulting in subdata that approaches the optimal solution. To achieve this, we develop a novel algorithm that applies to a general model, accommodates arbitrary choices of $N$ and $n$, and supports multiple optimality criteria, and we prove its convergence. Moreover, the new methodology facilitates an assessment of the efficiency of subdata selected by any method by obtaining tight lower and upper bounds for the efficiency. We show that the subdata obtained through the new methodology is highly efficient and outperforms all existing methods.

artificial intelligence, machine learning, subdata, (16 more...)

2604.2393

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Neural Information Processing SystemsMar-17-2026, 07:28:06 GMT

k*-Nearest Neighbors: From Global to Local

The weighted k-nearest neighbors algorithm is one of the most fundamental non-parametric methods in pattern recognition and machine learning. The question of setting the optimal number of neighbors as well as the optimal weights has received much attention throughout the years, nevertheless this problem seems to have remained unsettled. In this paper we offer a simple approach to locally weighted regression/classification, where we make the bias-variance tradeoff explicit. Our formulation enables us to phrase a notion of optimal weights, and to efficiently find these weights as well as the optimal number of neighbors efficiently and adaptively, for each data point whose value we wish to estimate. The applicability of our approach is demonstrated on several datasets, showing superior performance over standard locally weighted methods.

artificial intelligence, machine learning, neural information processing system 29, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Neural Information Processing SystemsFeb-17-2026, 05:16:42 GMT

Kernel Quadrature with Randomly Pivoted Cholesky Ethan N. Epperly and Elvira Moreno

This paper presents new quadrature rules for functions in a reproducing kernel Hilbert space using nodes drawn by a sampling algorithm known as randomly pivoted Cholesky. The resulting computational procedure compares favorably to previous kernel quadrature methods, which either achieve low accuracy or require solving a computationally challenging sampling problem.

artificial intelligence, machine learning, rpc holesky, (17 more...)

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)

Industry: Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Neural Information Processing SystemsFeb-11-2026, 04:51:51 GMT

f9028faec74be6ec9b852b0a542e2f39-Paper.pdf

classifier, convergence rate, msk -nn, (15 more...)

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial IntelligenceDec-4-2025

ALARM: Automated MLLM-Based Anomaly Detection in Complex-EnviRonment Monitoring with Uncertainty Quantification

Zhang, Congjing, Lin, Feng, Zhao, Xinyi, Guo, Pei, Li, Wei, Chen, Lin, Zhao, Chaoyue, Huang, Shuai

The advance of Large Language Models (LLMs) has greatly stimulated research interest in developing multi-modal LLM (MLLM)-based visual anomaly detection (VAD) algorithms that can be deployed in complex environments. The challenge is that in these complex environments, the anomalies are sometimes highly contextual and also ambiguous, and thereby, uncertainty quantification (UQ) is a crucial capacity for an MLLM-based VAD system to succeed. In this paper, we introduce our UQ-supported MLLM-based VAD framework called ALARM. ALARM integrates UQ with quality-assurance techniques like reasoning chain, self-reflection, and MLLM ensemble for robust and accurate performance and is designed based on a rigorous probabilistic inference pipeline and computational process. Extensive empirical evaluations are conducted using the real-world smart-home benchmark data and wound image classification data, which shows ALARM's superior performance and its generic applicability across different domains for reliable decision-making.

data mining, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2512.03101

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Consumer Health (0.68)
Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (0.46)
Health & Medicine > Therapeutic Area > Musculoskeletal (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Li, Jingyang, Lyu, Zhongyuan

HeteroJIVE: Joint Subspace Estimation for Heterogeneous Multi-View Data

arXiv.org Machine LearningDec-3-2025

Many modern datasets consist of multiple related matrices measured on a common set of units, where the goal is to recover the shared low-dimensional subspace. While the Angle-based Joint and Individual Variation Explained (AJIVE) framework provides a solution, it relies on equal-weight aggregation, which can be strictly suboptimal when views exhibit significant statistical heterogeneity (arising from varying SNR and dimensions) and structural heterogeneity (arising from individual components). In this paper, we propose HeteroJIVE, a weighted two-stage spectral algorithm tailored to such heterogeneity. Theoretically, we first revisit the ``non-diminishing" error barrier with respect to the number of views $K$ identified in recent literature for the equal-weight case. We demonstrate that this barrier is not universal: under generic geometric conditions, the bias term vanishes and our estimator achieves the $O(K^{-1/2})$ rate without the need for iterative refinement. Extending this to the general-weight case, we establish error bounds that explicitly disentangle the two layers of heterogeneity. Based on this, we derive an oracle-optimal weighting scheme implemented via a data-driven procedure. Extensive simulations corroborate our theoretical findings, and an application to TCGA-BRCA multi-omics data validates the superiority of HeteroJIVE in practice.

heterogeneity, heterojive, matrix, (17 more...)

2512.02866

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.93)

arXiv.org Artificial IntelligenceDec-3-2025

DiCaP: Distribution-Calibrated Pseudo-labeling for Semi-Supervised Multi-Label Learning

Han, Bo, Li, Zhuoming, Wang, Xiaoyu, Hou, Yaxin, Liu, Hui, Hou, Junhui, Jia, Yuheng

Semi-supervised multi-label learning (SSMLL) aims to address the challenge of limited labeled data in multi-label learning (MLL) by leveraging unlabeled data to improve the model's performance. While pseudo-labeling has become a dominant strategy in SSMLL, most existing methods assign equal weights to all pseudo-labels regardless of their quality, which can amplify the impact of noisy or uncertain predictions and degrade the overall performance. In this paper, we theoretically verify that the optimal weight for a pseudo-label should reflect its correctness likelihood. Empirically, we observe that on the same dataset, the correctness likelihood distribution of unlabeled data remains stable, even as the number of labeled training samples varies. Building on this insight, we propose Distribution-Calibrated Pseudo-labeling (DiCaP), a correctness-aware framework that estimates posterior precision to calibrate pseudo-label weights. We further introduce a dual-thresholding mechanism to separate confident and ambiguous regions: confident samples are pseudo-labeled and weighted accordingly, while ambiguous ones are explored by unsupervised contrastive learning. Experiments conducted on multiple benchmark datasets verify that our method achieves consistent improvements, surpassing state-of-the-art methods by up to 4.27%.

artificial intelligence, machine learning, unlabeled data, (16 more...)

arXiv.org Artificial Intelligence

2511.20225

Country: Asia > China (0.29)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing SystemsNov-21-2025, 14:27:50 GMT

k*-Nearest Neighbors: From Global to Local

electronic proceedings, name change, neighbor, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

arXiv.org Machine LearningNov-4-2025

Learning an Efficient Optimizer via Hybrid-Policy Sub-Trajectory Balance

Guan, Yunchuan, Liu, Yu, Zhou, Ke, Li, Hui, Jia, Sen, Shen, Zhiqi, Wang, Ziyang, Zhang, Xinglin, Chen, Tao, Hwang, Jenq-Neng, Li, Lei

Recent advances in generative modeling enable neural networks to generate weights without relying on gradient-based optimization. However, current methods are limited by issues of over-coupling and long-horizon. The former tightly binds weight generation with task-specific objectives, thereby limiting the flexibility of the learned optimizer. The latter leads to inefficiency and low accuracy during inference, caused by the lack of local constraints. In this paper, we propose Lo-Hp, a decoupled two-stage weight generation framework that enhances flexibility through learning various optimization policies. It adopts a hybrid-policy sub-trajectory balance objective, which integrates on-policy and off-policy learning to capture local optimization policies. Theoretically, we demonstrate that learning solely local optimization policies can address the long-horizon issue while enhancing the generation of global optimal weights. In addition, we validate Lo-Hp's superior accuracy and inference efficiency in tasks that require frequent weight updates, such as transfer learning, few-shot learning, domain generalization, and large language model adaptation.

artificial intelligence, machine learning, natural language, (19 more...)

2511.00543

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(12 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)